Background / Purpose:

Description of prior research, intellectual context, and the focus of the research. 

Over the past several years, research teams have developed observational instruments to measure
the quality of teachers’ instructional practices. Instruments such as the Framework for Teaching
(FFT) and the Classroom Assessment Scoring System (CLASS) assess general teaching 
practices, including student-teacher interactions, behavior management, and instructional 
pedagogy (Kane, Taylor, Tyler, & Wooten, 2011; Pianta, Belsky, Vandergrift, Houts, & 
Morrison, 2008). Other instruments such as the Protocol for Language Arts Teaching
Observations (PLATO) and the Mathematical Quality of Instruction (MQI) attend to content-specific
practices that are more pertinent for teaching and learning in specific disciplines
(Grossman et al., 2012; Hill et al., 2008). Working at the intersection of both types of practices,
we attempt to describe instruction using both generic and content-specific measures of teaching 
practice. Because research teams have focused on the measurement properties of individual
instruments, the extent to which generic and content-specific instruments capture related
constructs remains unclear. This research is of value to both researchers and school leaders, who might
be interested in enumerating a parsimonious list of teaching practices that brings together generic 
and content-specific aspects of instruction; such a list could be useful for a number of purposes, 
including the creation of comprehensive evaluation frameworks and studies of the relationships 
between different domains of teaching practice and student learning. 

To our knowledge, only one study has begun to address this area of inquiry. The Measures of 
Effective Teaching Project collected data from teachers across six urban school districts on 
multiple observation instruments including the four listed above. Kane and Staiger (2012) found 
that items tended to cluster within instruments to form up to three principal components. Using the same
data and a factor analysis framework, McClellan and colleagues (2013) examined overlap
between content-specific and general observation instruments, finding little overlap and as many as
twelve factors. At the same time, neither set of authors attempted to explore potential overlap
between instruments through more complex factor structures, such as bi-factor models that 
attempt to account for instrument-specific variation. 

In this study we use exploratory and confirmatory factor analysis to examine scores of math 
instruction generated using two observation instruments, the MQI and CLASS, across a sample 
of over 300 fourth- or fifth-grade teachers. We attempt to answer the following two research 
questions: (1) How many unique dimensions of instruction do we measure? (2) To what extent is 
there overlap in the dimensions of instruction captured by these two instruments? 

Data / Participants: 

Description of the participants in the study: who, how many, key features, or characteristics. 

Our sample consists of fourth- and fifth-grade teachers from four school districts in the 2010-2011
and 2011-2012 school years. Schools were selected into the study based on district referrals
and size; the study design required that schools have a minimum of two teachers in each of the 
sampled grades. Of eligible teachers, 309 (roughly 55%) agreed to participate. 
Teachers’ mathematics lessons (n=1,362) were captured over a two-year period, with three
lessons per teacher recorded each year. Videos were recorded using a three-camera, unmanned 
unit; site coordinators turned the camera on prior to the lesson and off at its conclusion. Most 
lessons lasted between 45 and 60 minutes. Teachers were allowed to choose the dates for capture 
in advance, and were directed to select typical lessons and exclude days on which students were
taking a test. Although it is possible that these videotaped lessons are different from teachers’ 
general instruction, teachers did not have any incentive to select lessons strategically as no 
rewards or sanctions were involved with data collection. In addition, analyses from the Measures 
of Effective Teaching project indicate that teachers are ranked almost identically when they 
choose lessons to be observed compared to when lessons are chosen for them (Ho & Kane, 2013).

Trained raters scored these lessons on two established observational instruments: the MQI, 
which focuses on mathematics-specific practices, and the CLASS, which focuses on general 
teaching practices. Validity studies have shown that both instruments successfully capture the 
quality of teachers’ instruction, and specific dimensions from each instrument have been shown 
to relate to student outcomes (Bell, Gitomer, McCaffrey, Hamre, & Pianta, 2012; Blazar, 2014; 
Hill, Charalambous, & Kraft, 2012). For the MQI, two raters watched each lesson and scored 
teachers’ instruction on 14 items for each seven-and-a-half-minute segment on a scale from Low
(1) to High (3). A single item, Classroom Work is Connected to Math, is scored as Not True (0)
or True (1). For the CLASS, one rater watched each lesson and scored teachers’ instruction on 12
items for each fifteen-minute segment on a scale from Low (1) to High (7) (see Table 1 for a full
list of items). Three items from the MQI (Major Errors, Language Imprecisions, and Lack of
Clarity) and one from the CLASS (Negative Climate) have a negative valence and therefore were
reverse coded in this analysis. For both instruments, raters had to complete an online training,
pass a certification exam, and participate in ongoing calibration sessions. 
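
As a minimal sketch of the reverse-coding step described above (not the study's actual code), a negatively valenced item can be flipped by subtracting each score from the sum of the scale endpoints. The function name `reverse_code` and the example scores are hypothetical; only the 1-3 and 1-7 scale ranges come from the text.

```python
def reverse_code(score, low, high):
    """Flip a score on a [low, high] scale so that higher always means better."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {low}-{high} scale")
    return low + high - score


# Hypothetical segment scores; the scale endpoints are the ones given in the text.
mqi_major_errors = [1, 3, 2]        # MQI items are scored from 1 (Low) to 3 (High)
class_negative_climate = [1, 2, 7]  # CLASS items are scored from 1 (Low) to 7 (High)

mqi_reversed = [reverse_code(s, 1, 3) for s in mqi_major_errors]
class_reversed = [reverse_code(s, 1, 7) for s in class_negative_climate]
```

Under this transformation a Low score on Major Errors becomes a High score, so all items point in the same direction before correlations or factor models are estimated.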

We used these data to create two datasets. The first is a teacher-level dataset with scores for each 
item on both the MQI and CLASS averaged across segments, lessons, and raters (for the MQI). 
The second is a segment-level dataset that captures the original scores assigned to each teacher 
by raters. For the MQI, we averaged scores across raters within a given segment to match the 
structure of the CLASS. Given that for any individual lesson there are twice as many segments
for the MQI as for the CLASS, we assigned CLASS scores of the full fifteen-minute segment
to the corresponding seven-and-a-half-minute segments from the MQI. 
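
The construction of the segment-level dataset can be illustrated with a short sketch. It assumes that MQI segment i (counting from zero) falls inside CLASS segment i // 2, which follows from the segment lengths given above; the helper names and the scores are hypothetical and are not taken from the study.

```python
from statistics import mean


def average_raters(mqi_scores_by_rater):
    """Average MQI scores across raters, segment by segment."""
    return [mean(segment) for segment in zip(*mqi_scores_by_rater)]


def expand_class_to_mqi(class_scores, n_mqi_segments):
    """Repeat each 15-minute CLASS score for the 7.5-minute MQI segments it covers."""
    return [class_scores[i // 2] for i in range(n_mqi_segments)]


# Hypothetical scores for one item per instrument in a single 30-minute lesson.
mqi_rater_a = [1, 2, 3, 2]   # four 7.5-minute segments, first rater
mqi_rater_b = [2, 2, 3, 1]   # same segments, second rater
class_scores = [4, 6]        # two 15-minute segments, single rater

mqi_avg = average_raters([mqi_rater_a, mqi_rater_b])
class_expanded = expand_class_to_mqi(class_scores, len(mqi_avg))
segment_rows = list(zip(mqi_avg, class_expanded))  # one row per MQI segment
```

Each resulting row pairs a rater-averaged MQI score with the CLASS score of the fifteen-minute segment that contains it.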

Analysis: 

Description of the methods for collecting and analyzing data. 

To answer our research questions, we conducted three sets of analyses. We began by examining 
pairwise correlations of items across instruments. This allowed us to explore the degree of 
potential overlap in the dimensions of instruction captured by each instrument. Next, we 
conducted a set of exploratory factor analyses to identify the maximum number of factors we 
might expect to see, both within and across instruments. While we conducted these analyses 
combining data from both instruments, prior research suggests that we would not expect to see 
much overlap across instruments (McClellan et al., 2013). Therefore, we conducted a set of
confirmatory factor analyses to account for sources of variance that had not been addressed in 
previous analyses. In particular, we utilized a bi-factor model to extract instrument-specific 
variation, and then tested factor structures that allowed items to cluster across instruments. 
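
The first analytic step, pairwise correlations of items across instruments, can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the item names are taken from the text, but the teacher-level scores and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant item")
    return sxy / math.sqrt(sxx * syy)


def cross_instrument_correlations(mqi_items, class_items):
    """Correlate every MQI item with every CLASS item (teacher-level scores)."""
    return {
        (mqi_name, class_name): pearson(mqi_scores, class_scores)
        for mqi_name, mqi_scores in mqi_items.items()
        for class_name, class_scores in class_items.items()
    }


# Hypothetical teacher-level averages for four teachers.
mqi_items = {"Student Explanations": [1.0, 1.5, 2.0, 2.5]}
class_items = {
    "Analysis and Problem Solving": [2.0, 3.0, 4.0, 5.0],
    "Behavior Management": [6.0, 5.0, 5.5, 4.0],
}
correlations = cross_instrument_correlations(mqi_items, class_items)
```

Only cross-instrument pairs are computed here, since those are the correlations that speak to overlap between the MQI and the CLASS.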
Findings / Results:

Description of the main findings with specific details. 


(Please insert Table 1 here). In Table 1, we present teacher-level correlations of items across
instruments. Here, we find that some items on the MQI and CLASS are highly related. For 
example, Analysis and Problem Solving from CLASS is correlated with multiple items from the 
MQI (Multiple Methods, Use Student Productions, Student Explanations, Student Mathematical
Questioning and Reasoning, and Enacted Task Cognitive Activation) above 0.3. Three items 
from the MQI - Language, Use Student Productions, and Student Mathematical Questioning and 
Reasoning (SMQR) - are correlated with all items from CLASS, though at lower magnitudes. 
This suggests that items from the two measures seem to be capturing somewhat similar facets of 
instruction and that factor structures might include factors with loadings across instruments.

(Please insert Table 2 here). At the same time, exploratory factor analyses of teacher-level scores 
indicate little overlap between instruments. In Table 2, we present eigenvalues and factor 
loadings for a parsimonious list of factors generated by focusing on the factors with eigenvalues 
larger than 1 (Kline, 1994). While not shown here, we also conducted separate factor analyses for
individual instruments and school years, as well as for segment-level scores, and found similar
factor structures across all analyses. Results indicate that four factors are needed to
parsimoniously capture the observed variation in instruction, and these factors do not suggest any 
substantial crossover between instruments. Generally, items appear to load onto only one factor, 
with the only exception being the MQI item of Mathematical Language, which had relatively 
low loadings on two factors. Classroom Work is Connected to Math does not load strongly onto 
any factor, which may be due either to the unique scaling of this item or to the fact that this item 
does not reflect content-specific aspects of instruction. We therefore allowed this item to load
freely on either the generic or the content-specific factors. We label these four factors “Ambitious
Mathematics Instruction”, “Mathematical Errors”, “Classroom Organization”, and “Classroom
Climate and Support”, with the first two from the MQI and the latter two from the CLASS.
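
The eigenvalue-greater-than-one rule used to choose the number of factors can be illustrated with a small sketch. Everything below is an assumption made for illustration: the correlation matrix is hypothetical, the eigenvalues are obtained with a plain Jacobi rotation routine rather than whatever software the study used, and the rule is applied to the unreduced correlation matrix.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def count_factors(corr, threshold=1.0):
    """Number of eigenvalues of the correlation matrix above the threshold."""
    return sum(1 for value in jacobi_eigenvalues(corr) if value > threshold)


# Hypothetical correlations: two items per instrument, no cross-instrument overlap.
corr = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]
eigenvalues = jacobi_eigenvalues(corr)
n_factors = count_factors(corr)
```

In this toy matrix the items correlate only within instrument, so the rule retains one factor per instrument, which mirrors the kind of instrument-specific clustering reported above.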

(Please insert Table 3 here). In Table 3, we identify other potential model structures, beginning 
with a model using a single instructional factor and building towards a four-factor model similar
to that identified by the exploratory factor analysis results. Models 1 through 5 do not allow
items to load across factors; however, we do explore models with items loading across
instruments, which we capture in Model 6. Models 7 through 11 are bi-factor models that extract
instrument-specific variation as well as theoretically driven instructional factors. Therefore, all 
items load onto two factors - one for the instrument on which they are scored and another for a 
particular instructional domain. Given the nested structure of the data, we run these models at 
both the teacher- and segment-level. Ideally, we would be able to fit a three-level model with 
segments nested within lessons, nested within teachers; however, due to non-convergence issues 
common to bi-factor models, we are only able to show results for a two-level model with 
segments nested within teachers. 
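
One way to write the bi-factor structure described in this paragraph is sketched below. The notation is ours and is an assumption rather than the authors' specification: i indexes teachers, j indexes items, m(j) is the instrument (MQI or CLASS) on which item j is scored, and d(j) is the instructional domain to which item j is assigned.

```latex
x_{ij} = \mu_j
       + \lambda^{\mathrm{inst}}_{j}\,\eta_{i,\,m(j)}
       + \lambda^{\mathrm{dom}}_{j}\,\xi_{i,\,d(j)}
       + \varepsilon_{ij},
\qquad m(j) \in \{\mathrm{MQI}, \mathrm{CLASS}\}
```

Here \eta denotes the instrument-specific factors, \xi the instructional factors, and \varepsilon_{ij} the item residual. Bi-factor models commonly constrain the instrument factors to be uncorrelated with the instructional factors; whether that constraint was imposed in these models is not stated in the text.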

(Please insert Table 4 here). In Table 4, we present model fit indices for all of these models. 
Because most models are not nested, we cannot compare them based on formal statistical 
significance tests. Instead, we rely on criteria from the field and, in particular, on the AIC and 
BIC indices (Akaike, 1987; Kline, 2011). When we compare models using AIC and BIC indices,
we observe that a three- or four-factor structure best fits the data. In the one-level model with
scores at the teacher level, the best-fitting model (i.e., the one with the smallest AIC and BIC values) is
Model 9, which is a bi-factor model with three instructional factors: “Ambitious Instruction”, 
“Mathematical Errors”, and “Classroom Pedagogy”. “Mathematical Errors” consists solely of 
items from the MQI instrument, while the other two factors include items from both instruments. 
At the same time, once we extract variation for the MQI instrument, we do not observe 
substantial variation on the “Ambitious Instruction” factor. This may be because all but four 
items from the MQI are included in this factor. The second best-fitting model is Model 4, which 
matches results observed from exploratory factor analyses. Results from the two-level models 
also indicate that Model 4 has the best fit of the models that do not attempt to correct for 
instrument-specific variation. Because only one bi-factor, two-level model converged, we are not 
able to draw conclusions from this set of analyses. 
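
The logic of comparing non-nested models on information criteria can be sketched as follows. The formulas are the standard definitions of AIC and BIC, together with one common definition of the sample-size adjusted BIC that replaces n with (n + 2) / 24; the text does not say which software or definition was used, and the model names, log-likelihoods, and parameter counts below are hypothetical.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """AIC, BIC, and sample-size adjusted BIC for one fitted model."""
    deviance = -2.0 * log_likelihood
    return {
        "AIC": deviance + 2.0 * n_params,
        "BIC": deviance + n_params * math.log(n_obs),
        "SABIC": deviance + n_params * math.log((n_obs + 2) / 24),
    }


def best_model(fits, criterion):
    """Name of the model with the smallest value on the chosen criterion."""
    return min(fits, key=lambda name: fits[name][criterion])


# Hypothetical fits; only the sample size of 309 teachers comes from the text.
n_teachers = 309
fits = {
    "Model A": information_criteria(log_likelihood=-1200.0, n_params=50, n_obs=n_teachers),
    "Model B": information_criteria(log_likelihood=-1150.0, n_params=80, n_obs=n_teachers),
}
best_by_aic = best_model(fits, "AIC")
best_by_bic = best_model(fits, "BIC")
```

In this toy comparison the two criteria disagree, because BIC penalizes the additional parameters of the larger model more heavily than AIC does. This is one reason to report several indices side by side, as in Table 4.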

It is important to note that no individual model meets commonly accepted criteria for overall
model fit. This is likely because some sources of variation (segments, lessons, raters, etc.) are
not being modeled well. Other reasons, such as non-normal item score distributions, may also
play a role. Finally, the purpose of using at most four factors was to be
parsimonious, not to capture all of the variation in our data.

Conclusions: 

Description of conclusions, recommendations, and limitations based on findings. 

For years, scholars have attended either to generic or to content-specific teaching practices,
without systematic attempts to consider both types in tandem. Responding to older (e.g., Brophy) 
and more contemporary (e.g., Grossman & McDonald, 2008) calls to attend to both types of 
practices, in this study we explored the benefits that can be accrued by working at the 
intersection of generic and content-specific practices. Despite their obvious limitations, our 
results seem to be in favor of integrating the two types of practices: they suggest that, although 
there still seem to be some more content-specific factors (e.g., “Errors and Imprecision”) and
some more generic teaching factors (e.g., “Pedagogy”), other factors combine generic and
content-specific aspects of instruction (e.g., “Ambitious Instruction”).

This finding has implications for the measurement community. In this study, we found that items 
correlate both within and across instruments, and that, once we extracted instrument-specific
variation (which might be attributed, among other things, to the different scales used in the two
instruments), factors composed of items from both instruments fit the data better than factors
drawn from a single instrument. Of course, as acknowledged above, we are limited in our ability to model the
complexity of the data; future work may attempt to model the remaining sources of variation.

In addition, our findings could inform policy around teacher education and professional
development programs. Results highlight a parsimonious list of instructional factors - both 
general and content-specific - that, in turn, can be used in future work to explore which of these 
dimensions or combinations thereof matter most to student learning outcomes. Once validation 
studies link these dimensions to student learning, teacher preparation and professional 
development programs can target these areas of teacher practice as they prepare pre-service and 
in-service teachers for the complex work of teaching. 
... -4619.57, -4742.78

Sample-Size Adjusted BIC (criterion: smallest value): -506J99, -2816:2, -1299.47, -3220.08, -4902.22, -4896.86, -3518J8, -1663.28, -1966.33, -5106.77, -19622, -5094.52

Chi-Square Test of Model Fit (criterion: >.05): 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0

RMSEA (Root Mean Square Error of Approximation) (criterion: <.05): 0.202, 0.158, 0.113, 0.148, 0.09, 0.091, 0.148, 0.103, 0.092, 0.084, 0.091, 0.086

CFI (criterion: >.95): 0, 0.433, 0.712, 0.509, 0.819, 0.817, 0.513, 0.784, 0.828, 0.857, 0.833, 0.85

SRMR (Standardized Root Mean Square Residual) (criterion: <.1): 0.301, 0.165, 0.092, 0.184, 0.071, 0.073, 0.146, 0.07, 0.083, 0.07, 0.069, 0.066

Two-Level Models (Segments Nested Within Teacher)

Akaike (AIC) (criterion: smallest value): 318885.2, 292344.2, 284937.3, 290766.1, 282835.8, 283162.6, 288716.9, 282400.1

Bayesian (BIC) (criterion: smallest value): 319711.72, 2935 17 J, 286123.7, 291952.6, 284088.3, 284415.7, 289970, 283913.2

Sample-Size Adjusted BIC (criterion: smallest value): 319317.68, 292958, 285558.1, 291386.9, 283491.5, 283818.3, 289372.6, 283191.8

Note: Two-level models 8-11 did not converge. Nor did any model with three levels (segments nested within lesson, within teacher). Models that meet fit criterion are bolded.